Large scale biodiversity patterns

Showing some neat features of R!
Author
Affiliation
Published

March 7, 2024

Note

Since the eighteenth century, broad-scale patterns of diversity called the attention of naturalists. Recognizing that tropical regions have higher species richness relative to temperate areas, Alexander von Humboldt was the first one to propose it to emerge from climatic differences (Hawkins, 2001). This ubiquitous pattern has since then been known as the Latitudinal Diversity Gradient (LDG) and, although the global distribution of biodiversity is indeed far more complex than a simple unidirectional gradient (Hawkins & Felizola Diniz-Filho, 2004), the difference in species richness between temperate and tropical regions tends to capture the most evident facet of the distribution of life on Earth: its geographic heterogeneity.

Early explanations for the LDG in the 1950s and 1960s followed von Humboldt’s tradition and focused on the strong correlations observed between diversity (i.e., species richness) and components of current environmental variation—especially combinations of temperature and precipitation (Brown, 2014; Hawkins et al., 2003; O’Brien, 2006; Pianka, 1966; Simpson, 1964). These high correlations suggested a causal explanation, and spurred the development of hypotheses that aimed to identify the mechanisms affecting species distributions and hence driving geographical patterns (Currie et al., 2004). Although these diversity-environment correlations suggested “pure ecological explanations” that involved population-level processes tied to dispersal and aggregation of tropical organisms, it quickly became clear that deep-time evolutionary processes should also be taken into account to explain the LDG (Ricklefs, 2004; Rohde, 1992). In fact, as early as 1937, Theodosius Dobzhansky had proposed that diversity gradients should be explained by an interaction between ecological and evolutionary mechanisms, in which evolution would drive the dimensions of the niche—the set of biotic and abiotic factors that allow a species to exist indefinitely—that would allow different patterns of niche packing throughout environmental gradients. Today, it is consensus that the LDG should be explained not only by current climatic factors, but also by the long-term dynamics of such climatic factors and by events happening throughout the evolution of the species (Fine, 2015).

Today we will learn basic tools in R for visualizing species distributions, build geographical ranges, testing drivers of gradients of biodiversity under different approaches.

You will need three datasets, that will be provided for you:

  1. Species occurrence data points – live.oaks.txt

  2. Species geograhical ranges – Furnarii_ranges_geo.shp

  3. Environmental predictors – bio1.bil and bio12.bil

1 Set up your data and your working directory

Set up a working directory and put the data files in that directory. Tell R that this is the directory you will be using, and read in your data:

Option 1

You can download the data directly on your computer by clicking Occurrences, Geographical Ranges, and Environmental predictors and store them in the folder named Data”.

Option 2

You can also go ahead and run the next lines if you don’t want to copy a paste the links provided above.

Code
main.dir <- getwd() # Will get the working directory

# Print main directory

main.dir

# create a Data folder
dir.create("Data")

urls <- "https://www.dropbox.com/s/36m77nus3taywjp/Archive.zip?dl=1" # Name of the file to download

download.file(url = urls, file.path(main.dir, "Data/Archive.zip"), mode = "wb") # download the file in a specific folder

unzip("Data/Archive.zip", exdir = "Data/")

To do this laboratory you will need to have a set of R packages. Install the following packages:

Code
packages <- c("tidyverse", "sf", "scico", "rnaturalearth", 
              "rnaturalearthdata", "smoothr", "sp", "viridis", 
              "terra", "spdep", "ncf", 
              "spatialreg", "rasterVis", "RColorBrewer") 
# Package vector names
Function install.packages()

You can use the function install.packages() to install the packages.

If you don’t want to install the packages one by one, you can use the next command.

Code
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())

if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages], dependencies = TRUE)
}

This command, will, first, check if you already the packages installed, then if a package is not installed in your computer, will install it.

When installing multiple packages, please pay attention to the messages in your R console. In some cases, R will ask you if you want to install the source version. If that is the case, just type n and hit enter.

Load installed packages:

Code
library(scico)
library(rnaturalearth)
The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
which was just loaded, will retire in October 2023.
Please refer to R-spatial evolution reports for details, especially
https://r-spatial.org/r/2023/05/15/evolution4.html.
It may be desirable to make the sf package available;
package maintainers should consider adding sf to Suggests:.
The sp package is now running under evolution status 2
     (status 2 uses the sf package in place of rgdal)
Support for Spatial objects (`sp`) will be deprecated in {rnaturalearth} and will be removed in a future release of the package. Please use `sf` objects with {rnaturalearth}. For example: `ne_download(returnclass = 'sf')`
Code
library(smoothr)

Attaching package: 'smoothr'
The following object is masked from 'package:stats':

    smooth
Code
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(terra)
terra 1.7.39

Attaching package: 'terra'

The following object is masked from 'package:tidyr':

    extract

The following object is masked from 'package:smoothr':

    densify
Code
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

Double-check your working directory.

Function getwd()

You can use the function getwd() to get the current working directory.

2 From point occurrences to range maps

We will start setting the geographical extent of our study area and to do that we will use spatial data from the package {rnaturalearth}.

Code
sf_use_s2(FALSE)
Spherical geometry (s2) switched off
Code
# Get world map from the {rnaturalearthdata} package
worldMap <- ne_countries(scale = "medium", 
                         type = "countries", 
                         returnclass = "sf")

# cCountry subset - The United States and Mexico
NApoly <- worldMap %>% 
  filter(admin == "United States of America" | admin == "Mexico")

# trim to study area
limsNA <- st_buffer(NApoly, dist = 1) %>% 
  st_bbox() 
Warning in st_buffer.sfc(st_geometry(x), dist, nQuadSegs, endCapStyle =
endCapStyle, : st_buffer does not correctly buffer longitude/latitude data
dist is assumed to be in decimal degrees (arc_degrees).
Code
# neighboring countries
adjacentPolys <- st_touches(NApoly, worldMap)
although coordinates are longitude/latitude, st_touches assumes that they are
planar
Code
neighbours <- worldMap %>% 
  slice(pluck(adjacentPolys, 1))

We can plot the resulting map.

Code
ggplot() +
  geom_sf(data = neighbours, color = "white") +
  geom_sf(data = NApoly) +
  coord_sf(
    xlim = c(limsNA["xmin"], limsNA["xmax"]),
    ylim = c(limsNA["ymin"], limsNA["ymax"])
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )

Hmmmm, that is somewhat ugly, let’s adjust the coordinates a bit the map and plot it again…

Code
ggplot() +
  geom_sf(data = neighbours, color = "white") +
  geom_sf(data = NApoly) +
  coord_sf(
     xlim = c(-125, -65),
    ylim = c(10, 50)
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )

Much, much better!

Load species occurrences data points. We will use occurrences from Live oaks, that were obtained from iDigBio between 20 and 24 July 2018 by Jeannine Cavender-Bares. Notice that these occurrence data points were visually examined and any localities that were outside the known range of the species, or in unrealistic locations (e.g., water bodies, crop fields) were discarded.

Code
oaks_occ <- read_delim("Data/OCC/live.oaks.txt") %>% 
  filter(Species != "Hybrid") # removing hybrid observations
Rows: 672 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): Species
dbl (2): Longitude, Latitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
oaks_occ %>% 
  count(Species) # check how many species and how many observations per species
# A tibble: 7 × 2
  Species                n
  <chr>              <int>
1 Quercus_brandegei     35
2 Quercus_fusiformis    76
3 Quercus_geminata      51
4 Quercus_minima        38
5 Quercus_oleoides     292
6 Quercus_sagraena      40
7 Quercus_virginiana   134
Code
# Visualize

oaks_occ %>% 
  count(Species) %>% 
  ggplot(aes(x = n)) + 
  geom_histogram() + 
  xlab("Number of observations")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Transform data.frame to spatial data.frame

Code
# to sf object, specifying variables with coordinates and projection
oaks_occ_sf <- st_as_sf(oaks_occ, coords = c("Longitude", "Latitude"), crs = 4326) %>%
  #group_by(species) %>%
  st_cast("MULTIPOINT") %>% 
  group_by(Species) %>% 
  summarize()
although coordinates are longitude/latitude, st_union assumes that they are
planar
Code
glimpse(oaks_occ_sf)
Rows: 7
Columns: 2
$ Species  <chr> "Quercus_brandegei", "Quercus_fusiformis", "Quercus_geminata"…
$ geometry <MULTIPOINT [°]> MULTIPOINT ((-110.1556 23.7..., MULTIPOINT ((-100.…

What variables we have in the oaks_occ object?

Answer: There are three variables, species, longitude and latitude.

How many oak species are in the dataset?

Answer: There are seven oak species.

Code
unique(oaks_occ$Species)
[1] "Quercus_virginiana" "Quercus_geminata"   "Quercus_minima"    
[4] "Quercus_fusiformis" "Quercus_oleoides"   "Quercus_brandegei" 
[7] "Quercus_sagraena"  

What we did in the previous code was simply to transform a the data.frame object into a spatial data.frame. We can plot the results.

Code
ggplot() +
  geom_sf(data = neighbours, color = "white") +
  geom_sf(data = NApoly) + 
  geom_sf(data = oaks_occ_sf, aes(color = Species), alpha = 0.7) + 
  coord_sf(
     xlim = c(-125, -65),
    ylim = c(10, 50)
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )

Nice!

2.1 Range maps from point data

In this section we will learn how to create “simple” range maps based on geometry (e.g. minimum convex polygons, etc.), without considering environmental variables (e.g., ENMs or SDMs). Note that these range maps are geographical abstractions of the species ranges. In other words, a species range is the area where a particular species can be found during its lifetime. Species range includes areas where individuals or communities can migrate or hibernate

We will explore two alternative, one based on simple convex hull and the other is the smoothed convex hull

2.1.1 Convex hull

Code
# Observations to convex hull
oaks_CH <- st_convex_hull(oaks_occ_sf) 

# plot hulls
ggplot() +
  geom_sf(data = neighbours, color = "white") +
  geom_sf(data = NApoly) +
  geom_sf(data = oaks_CH, aes(fill = Species), alpha = 0.7) +
  scale_fill_scico_d(palette = "davos", direction = -1, end = 0.9, guide = FALSE) +
  coord_sf(
    xlim = c(-125, -65),
    ylim = c(10, 50)
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )
Warning: The `guide` argument in `scale_*()` cannot be `FALSE`. This was deprecated in
ggplot2 3.3.4.
ℹ Please use "none" instead.

Now try the smoothed version.

Code
# Smoothed convex hulls
oaks_SCH <- st_convex_hull(oaks_occ_sf) %>% 
  smoothr::smooth()

# plot smoothed hulls
ggplot() +
  geom_sf(data = neighbours, color = "white") +
  geom_sf(data = NApoly) +
  geom_sf(data = oaks_SCH, aes(fill = Species), alpha = 0.7) +
  scale_fill_scico_d(palette = "davos", direction = -1, end = 0.9, guide = FALSE) +
  coord_sf(
    xlim = c(-125, -65),
    ylim = c(10, 50)
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )

Please explain the results. How do you feel about that?

Answer: Out of the seven species, three have very large geographical ranges (Q. virginiana, Q. oleoides, and Q. fusiformis). The other four have small geographical ranges, of which Q. brandegei has the smallest range and is restricted to Baja California Sur, Mexico.

Until here we have explored how to plot, clean and build species geographical ranges using occurrences. Now we will use species geographical ranges of the largest continental endemic radiation (Furnariides) to explore the geographical gradients of species diversity.

3 Diversity gradients

3.1 Prepare data and mapping

The geographical ranges correspond to the Infraorder Furnariides (Aves). This data is available thorough BirdLife International and you can use any other group available on IUCN or BIEN (for plants in the Americas). In any case, you first need to download the polygons in shapefile format.

To load the Furnariides geographical ranges, we will use the function st_read() from the package {sf}. Please read the message printed on your console and try to understand the data.

Code
franges <- st_read("Data/Ranges/Furnarii_ranges_geo.shp") 
Reading layer `Furnarii_ranges_geo' from data source 
  `/Users/jpintole/Documents/GitHub/BiodiversityScience/Spring2024/Lab_2_Macroscale/Data/Ranges/Furnarii_ranges_geo.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 652 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -110.0817 ymin: -55.98222 xmax: -34.78897 ymax: 29.0965
Geodetic CRS:  WGS 84

Explore the imported data.

Code
class(franges)
[1] "sf"         "data.frame"

Now see all the data information.

Code
glimpse(franges)
Rows: 652
Columns: 16
$ SCINAME    <chr> "Acrobatornis fonsecai", "Acropternis orthonyx", "Anabacert…
$ Perimeter  <dbl> 3.889255, 38.861185, 47.712319, 114.669167, 87.950015, 40.7…
$ Area       <dbl> 1.908372e-01, 9.193185e+00, 4.477265e+01, 2.757285e+01, 1.7…
$ AreaKM2    <dbl> 1.908372e+03, 9.193185e+04, 4.477265e+05, 2.757285e+05, 1.7…
$ RD         <dbl> 17, 9, 17, 18, 18, 17, 17, 13, 16, 12, 12, 24, 18, 18, 17, …
$ DR         <dbl> 0.14478, 0.05713, 0.27500, 0.27008, 0.27944, 0.21302, 0.212…
$ BodyMass   <dbl> 13.70, 89.86, 19.20, 24.80, 27.80, 35.40, 39.00, 35.90, 41.…
$ DietInv    <dbl> 100, 80, 100, 100, 100, 100, 100, 100, 80, 100, 100, 100, 5…
$ DietFruit  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ DietSeed   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 20, 0, 0, 0, 50, 0, 0, 0, 50, 0, 0,…
$ StratGroun <dbl> 0, 100, 20, 0, 0, 0, 10, 50, 50, 20, 10, 50, 80, 30, 70, 70…
$ Understory <dbl> 0, 0, 60, 0, 33, 0, 20, 50, 50, 80, 30, 50, 20, 70, 30, 30,…
$ Midhigh    <dbl> 40, 0, 20, 100, 33, 100, 70, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0,…
$ Canopy     <dbl> 60, 0, 0, 0, 33, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 0, 0,…
$ Ages       <dbl> 3.260995, 10.994344, 0.262984, 1.358152, 0.262984, 2.492379…
$ geometry   <MULTIPOLYGON [°]> MULTIPOLYGON (((-39.3059 -1..., MULTIPOLYGON (…

What variables are present in the spatial polygon object? and How many species?

Answer: There are 652 species in this spatial object. The variables included are: Perimeter, Area, AreaKM2, RD, DR, BodyMass, DietInv, DietFruit, DietSeed, StratGroun, Understory, Midhigh, Canopy, Ages.

Code
length(unique(franges$SCINAME))
[1] 652
Code
frangesVars <- names(franges)[2:15]

frangesVars
 [1] "Perimeter"  "Area"       "AreaKM2"    "RD"         "DR"        
 [6] "BodyMass"   "DietInv"    "DietFruit"  "DietSeed"   "StratGroun"
[11] "Understory" "Midhigh"    "Canopy"     "Ages"      

Let’s plot a couple of species.

Code
selSPP <- franges %>% 
  filter(SCINAME == "Furnarius rufus" | SCINAME == "Anabazenops dorsalis")

# country subset
SApoly <- worldMap %>% 
  filter(continent == "South America")
Code
# plot the selected ranges
ggplot() +
  geom_sf(data = SApoly) +
  geom_sf(data = selSPP, aes(color = SCINAME), alpha = 0.7, size = 2) +
  scale_fill_scico_d(palette = "davos", direction = -1, 
                     end = 0.9, guide = FALSE) +
  coord_sf(
    xlim = c(-80, -35),
    ylim = c(10, -60)
  ) +
  theme(
    plot.background = element_rect(fill = "#f1f2f3"),
    panel.background = element_rect(fill = "lightblue"),
    panel.grid = element_blank(),
    line = element_blank(),
    rect = element_blank()
  )

Explain the distribution for both species (i.e., Furnarius rufus [blue polygon] and Anabazenops dorsalis [red polygon]) Are these species distributed in sympatry or allopatry? Explain the selected distribution pattern.

Answer: F. rufus is widely distributed mostly in the lowlands of South America (Bolivia, Brazil, Paraguay, Uruguay and Argentina). A. dorsalis is has a more restricted distribution mostly in the Andes (Bolivia, Perú, Ecuador and Colombia). Despite these two species belong to the same lineaje (Family Furnariidae) they present an allopatric distribution

3.2 Raster of species richness

Species richness is the number of different species represented in an ecological community, landscape or region. Species richness is simply a count of species, and it does not take into account the abundances of the species or their relative abundance distributions.

Now, let’s create a map that represent the species richness of Furnariides.

First create an empty raster for the Neotropics using the extent of the Furnariides ranges under a spatial resolution of 1º long-lat or 111 km at the equator.

Code
# Create raster ro store richness values
neo_ras <- rast() # empty raster

ext(neo_ras) <- ext(franges) # Set the raster "extent" 

res(neo_ras) <- 1 # Set the raster "resolution" 

neo_ras # print the raster object in the console
class       : SpatRaster 
dimensions  : 85, 75, 1  (nrow, ncol, nlyr)
resolution  : 1, 1  (x, y)
extent      : -110.0817, -35.08174, -55.98222, 29.01778  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
Code
values(neo_ras) <- 0 # assign O values to all pixels in the raster

Plot empty raster for the Neotropics

Code
# transform the sf object to a sp object
SApoly_sp <- as(SApoly, "Spatial") 

# Plot empty raster
plot(neo_ras)

plot(SApoly_sp, add = TRUE) ## overlay SA countries to the SR map

Now using the empty raster we will rasterize the species identities in each cell or pixel. The resulting raster will be the species richness of Furnariides across the Neotropics.

Code
f_sr_raster <- terra::rasterizeGeom(x = vect(franges), # species geographical ranges
                         y = neo_ras, # empty raster
                         fun = "count" # count number of species per grid cell
                         )
# this process can take a while depending on your computer (~45 secs in Jesús's computer), please be patient.

Plot the resulting raster.

Code
plot(f_sr_raster)

Let’s try changing the colors using the package {viridis}

Code
# country subset
Apoly <- worldMap %>% 
  filter(continent == "South America" | continent == "North America")

# transform the sf object to a sp object
Apoly_sp <- as(Apoly, "Spatial") 

# Plot the information
plot(mask(f_sr_raster, Apoly), # raster of species richness
     col = viridis::turbo(10), # colors
     zlim = c(min(values(f_sr_raster), 
                  max(values(f_sr_raster)))), 
     main = "Furnariides species richness") 
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add = TRUE) ## overlay SA countries to the SR map

Or we can try a more fancy way to plot the number of Furnariids’ species. To do that we can use the package {rasterVis} for plotting and the package {RColorBrewer} for selecting color combinations.

Code
library(rasterVis)
Loading required package: lattice
Code
library(RColorBrewer)

# First set a theme
mapTheme <- rasterTheme(region = rev(brewer.pal(11, "Spectral")),
  layout.widths = list(right.padding = 10),
  axis.line = list(col = "transparent"),
  tick = list(col = 'transparent'))

## Now we can plot the raster
p_furna_SR <- levelplot(mask(f_sr_raster, Apoly),
  maxpixels = 1e10,
  margin = FALSE, 
  main = list('Furnariides \n species richness', col = 'darkgray'), 
  par.settings = mapTheme,
  scales = list(x = list(draw = TRUE),
                y = list(draw = TRUE)),
  zlim = c(0, 180))
Warning: [mask] CRS do not match
Code
p_furna_SR

Awesome, right?. Now, please, describe the observed pattern!

Answer: The map shows the classic latitudinal diversity gradient, with a higher accumulation of species close to the equator and a continued decline in the number of species as we approach the South Pole.

3.3 Scale dependency

Now we will explore one of the oldest problems in ecology and evolution, the scale dependency in the data. So to explore this scale dependence, we will rasterize the Furnariides ranges, but using different spatial resolutions from 2º to 6º degrees of long-lat.

Set the empty rasters.

Code
# 2º degrees
neo_ras_2dg <- rast()
# Set the raster "extent" 
ext(neo_ras_2dg) <- ext(franges)
res(neo_ras_2dg) <- 2
neo_ras_2dg
class       : SpatRaster 
dimensions  : 43, 38, 1  (nrow, ncol, nlyr)
resolution  : 2, 2  (x, y)
extent      : -110.0817, -34.08174, -55.98222, 30.01778  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
Code
values(neo_ras_2dg) <- 0

# 4º degrees
neo_ras_4dg <- rast()
# Set the raster "extent" 
ext(neo_ras_4dg) <- ext(franges)
res(neo_ras_4dg) <- 4
neo_ras_4dg
class       : SpatRaster 
dimensions  : 21, 19, 1  (nrow, ncol, nlyr)
resolution  : 4, 4  (x, y)
extent      : -110.0817, -34.08174, -55.98222, 28.01778  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
Code
values(neo_ras_4dg) <- 0

# 8º degrees
neo_ras_8dg <- rast()
# Set the raster "extent" 
ext(neo_ras_8dg) <- ext(franges)
res(neo_ras_8dg) <- 8
neo_ras_8dg
class       : SpatRaster 
dimensions  : 11, 9, 1  (nrow, ncol, nlyr)
resolution  : 8, 8  (x, y)
extent      : -110.0817, -38.08174, -55.98222, 32.01778  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 
Code
values(neo_ras_8dg) <- 0

Now, rasterize the species richness to the desired pixel size.

Code
# Furnariides at 2º of long-lat
f_sr_2dg_raster <- terra::rasterizeGeom(x = vect(franges), # species geographical ranges
                         y = neo_ras_2dg, # empty raster
                         fun = "count" # count number of species per grid cell
                         )

# Furnariides at 4º of long-lat
f_sr_4dg_raster <- terra::rasterizeGeom(x = vect(franges), # species geographical ranges
                         y = neo_ras_4dg, # empty raster
                         fun = "count" # count number of species per grid cell
                         )

# Furnariides at 8º of long-lat
f_sr_8dg_raster <- terra::rasterizeGeom(x = vect(franges), # species geographical ranges
                         y = neo_ras_8dg, # empty raster
                         fun = "count" # count number of species per grid cell
                         )

Plot the four maps.

Code
par(mfrow = c (2, 2))

## Map at 1 degree grid cell
plot(mask(f_sr_raster, Apoly), # raster of species richness
     col = viridis::turbo(10), # colors
     zlim = c(min(values(f_sr_raster), 
                  max(values(f_sr_raster)))), 
     main = "Furnariides species richness 1 degree") 
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add = TRUE) 

## Map at 2 degrees grid cell
plot(mask(f_sr_2dg_raster, Apoly), # raster of species richness
     col = viridis::turbo(10), # colors
     zlim = c(min(values(f_sr_raster), 
                  max(values(f_sr_raster)))), 
     main = "Furnariides species richness 2 degrees") 
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add = TRUE) 

## Map at 4 degrees grid cell
plot(mask(f_sr_4dg_raster, Apoly), # raster of species richness
     col = viridis::turbo(10), # colors
     zlim = c(min(values(f_sr_raster), 
                  max(values(f_sr_raster)))), 
     main = "Furnariides species richness 4 degrees") 
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add = TRUE) 

## Map at 8 degrees grid cell
plot(mask(f_sr_8dg_raster, Apoly), # raster of species richness
     col = viridis::turbo(10), # colors
     zlim = c(min(values(f_sr_raster), 
                  max(values(f_sr_raster)))), 
     main = "Furnariides species richness 8 degrees") 
Warning: [mask] CRS do not match
Code
plot(Apoly_sp, add = TRUE) 

Code
#dev.off()

So, is there an effect of spatial scale?

Answer: Yes, with increasing the pixel size the pattern becomes diffuse and the detection of biogeographic patterns.

Explain the differences between the four maps

Answer: Maps at 1 (~111 km) and 2 degrees are similar, showing a high concentration of species in the Amazon rainforest. With pixels of 4 and 8 degrees, regions that previously had low diversity now appear to contain a high number of species.

How do you feel about that?

Answer: The scale of analysis matters, and we should take care of it before making conclusions about the observed biodiversity patterns.

3.4 Correlative relationships

3.4.1 Species richness as a function of evolutionary history

Let’s try to rasterize another information from the polygon data set. We will use the information in the column RD, this data correspond to the numbers of nodes from the tips to the root of a phylogenetic tree or just root distance, thus, will use the RD to calculate the MRD metric (mean root distance) that measures the evolutionary derivedness of species within an assemblage (Kerr & Currie, 1999) and can be used to determine whether a local fauna is constituted primarily by early-diverged or by recently originated species (Hawkins et al., 2012; Pinto-Ledezma et al., 2017). In other words, high MRD values means that the community (i.e., grid-cell) is composed mostly by recently originated species, whereas low MRD values by early-diverged species.

Code
franges
Simple feature collection with 652 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -110.0817 ymin: -55.98222 xmax: -34.78897 ymax: 29.0965
Geodetic CRS:  WGS 84
First 10 features:
                      SCINAME  Perimeter         Area      AreaKM2 RD      DR
1       Acrobatornis fonsecai   3.889255 1.908372e-01 1.908372e+03 17 0.14478
2        Acropternis orthonyx  38.861185 9.193185e+00 9.193185e+04  9 0.05713
3      Anabacerthia amaurotis  47.712319 4.477265e+01 4.477265e+05 17 0.27500
4  Anabacerthia striaticollis 114.669167 2.757285e+01 2.757285e+05 18 0.27008
5  Anabacerthia variegaticeps  87.950015 1.797516e+01 1.797516e+05 18 0.27944
6        Anabazenops dorsalis  40.702820 5.437707e+01 5.437707e+05 17 0.21302
7          Anabazenops fuscus  51.742639 3.231765e+01 3.231765e+05 17 0.21293
8      Ancistrops strigilatus  72.320657 2.051860e+02 2.051860e+06 13 0.10415
9            Anumbius annumbi 116.220827 2.724159e+02 2.724159e+06 16 0.12340
10      Aphrastura masafuerae   0.238199 1.054717e-03 1.054717e+01 12 0.12789
   BodyMass DietInv DietFruit DietSeed StratGroun Understory Midhigh Canopy
1     13.70     100         0        0          0          0      40     60
2     89.86      80         0        0        100          0       0      0
3     19.20     100         0        0         20         60      20      0
4     24.80     100         0        0          0          0     100      0
5     27.80     100         0        0          0         33      33     33
6     35.40     100         0        0          0          0     100      0
7     39.00     100         0        0         10         20      70      0
8     35.90     100         0        0         50         50       0      0
9     41.50      80         0       20         50         50       0      0
10    12.20     100         0        0         20         80       0      0
        Ages                       geometry
1   3.260995 MULTIPOLYGON (((-39.3059 -1...
2  10.994344 MULTIPOLYGON (((-76.0804 2....
3   0.262984 MULTIPOLYGON (((-43.80932 -...
4   1.358152 MULTIPOLYGON (((-73.66269 -...
5   0.262984 MULTIPOLYGON (((-79.76914 -...
6   2.492379 MULTIPOLYGON (((-75.6588 1....
7   2.492379 MULTIPOLYGON (((-45.23222 -...
8   4.731734 MULTIPOLYGON (((-54.28493 -...
9   5.152209 MULTIPOLYGON (((-61.54778 -...
10  0.885548 MULTIPOLYGON (((-80.76715 -...

Rasterize the species’ Root distance to create a map of Mean Root Distance.

Code
f_MRD_raster <- terra::rasterize(vect(franges), # polygons
                          neo_ras, # empty raster
                          field = "RD", # Root distance
                          fun = mean # function mean
                          )

Plot the results

Code
plot(f_MRD_raster, 
     main = 'Furnariides mean root distance')

plot(Apoly_sp, add = TRUE)

Let’s try changing the colors.

Code
## Now we can plot the raster
p_furna_MRD <- levelplot(f_MRD_raster,
  maxpixels = 1e10,
  margin = FALSE, 
  main = list('Furnariides \n mean root distance', col = 'darkgray'), 
  par.settings = mapTheme,
  scales = list(x = list(draw = TRUE),
                y = list(draw = TRUE)),
  zlim = c(0, 25))

p_furna_MRD

Based on the description provided above, please describe the MRD pattern

Answer: The map of mean root distance (MRD) shows an inverse pattern when compared with the patterns of species richness. Specifically, areas/pixels with a high number of species present low MRD, and areas/pixels with a low number of species have high MRD values. This might suggest that areas with low number of co-occurring species are mostly composed by recently-evolved species, conversely, areas with high number of species by species that diverged early in the evolutionary history of the clade.

Let’s plot both raster.

Code
par(mfrow = c(1, 2))

plot(mask(f_sr_raster, Apoly), 
     col = viridis::plasma(10), 
     main = "Furnariides species richness")
Warning: [mask] CRS do not match
Code
plot(mask(f_MRD_raster, Apoly), 
     col = viridis::plasma(10), 
     main = "Furnariides mean root distance")
Warning: [mask] CRS do not match

Code
#dev.off()

Check if there is a relationship between the species richness and the evolutionary derivedness.

Code
cor.test(values(f_sr_raster), values(f_MRD_raster))

    Pearson's product-moment correlation

data:  values(f_sr_raster) and values(f_MRD_raster)
t = -32.945, df = 1644, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6588512 -0.6005926
sample estimates:
       cor 
-0.6306094 

Or as in the previous lab, we can create a model that explain the association.

Code
obj <- lm(values(f_sr_raster) ~ values(f_MRD_raster))

summary(obj)

Call:
lm(formula = values(f_sr_raster) ~ values(f_MRD_raster))

Residuals:
     Min       1Q   Median       3Q      Max 
-109.581  -13.659   -0.013   13.919  115.372 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)            209.46       4.83   43.37   <2e-16 ***
values(f_MRD_raster)   -10.54       0.32  -32.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.81 on 1644 degrees of freedom
  (4729 observations deleted due to missingness)
Multiple R-squared:  0.3977,    Adjusted R-squared:  0.3973 
F-statistic:  1085 on 1 and 1644 DF,  p-value: < 2.2e-16
Code
# get pixel values from raster richness
data_sr <- as.data.frame(f_sr_raster, xy = TRUE) 

# get pixel values from raster MRD
data_mrd <- as.data.frame(f_MRD_raster, xy = TRUE)

# Combine both datasets
data_sr_mrd <- left_join(data_sr, data_mrd, by = c("x", "y")) %>% 
  rename(SR = area, MRD = RD) %>% 
  drop_na(MRD)

# Plot the association
data_sr_mrd %>% 
  ggplot(aes(x = MRD, y = SR)) + 
  geom_point(color = "darkgray", size = 3, alpha = 0.5) + 
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'

Hmmm. What happened in here? Please answer the next questions.

From the mean root distance map, it is possible to explain the Furnariides diversity gradient? If so, please explain from an evolutionary perspective.

Answer: Species-rich areas present low MRD values, and species-poor areas present high MRD values. MRD is a metric that measures deriveness, and high MRD means that assemblages are composed of recently evolved species. This can also be interpreted as these pixels/areas also present higher speciation rates. On the contrary, areas with low MRD will present low rates of speciation, and the higher species richness in these areas is a product of steady but low rates of speciation.

Save the figures

There are multiple options to save the figures. Jesús particularly like saving his figures in PDF. To save the figures in a pdf file, you can use the following code.

pdf(“association_MRD_SR.pdf”, height = 5, width = 7)

data_sr_mrd %>% ggplot(aes(x = MRD, y = SR)) + geom_point(color = “darkgray”) + geom_smooth(method = “lm”)

dev.off()

This lines will save your figure in your working directory.

3.4.2 Species richness as a function of environment

Load the environmental variables that correspond to bio1 (Annual Mean Temperature) and bio12 (Annual Precipitation). These data correspond to two variables out of 19 from WorldClim (http://www.worldclim.org/current). We will use these two variables just for educational purposes, rather to make a complete evaluation of the species-environmental relationships.

Code
# Annual Mean Temperature
bio1 <- rast("Data/BioClim/bio1.bil")
bio1 <- bio1/10

# Annual Precipitation
bio12 <- rast("Data/BioClim/bio12.bil")
bio12
class       : SpatRaster 
dimensions  : 900, 2160, 1  (nrow, ncol, nlyr)
resolution  : 0.1666667, 0.1666667  (x, y)
extent      : -180, 180, -60, 90  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : bio12.bil 
name        : bio12 
min value   :     0 
max value   :  9916 

Plot the environmental variables

Code
plot(bio1) 

Code
plot(bio12)

Ok, the bio1 and bio12 layers are at global scale, so now will need to crop them to the extent of the Neotropics.

Code
bio1_neo <- crop(bio1, ext(franges))
bio12_neo <- crop(bio12, ext(franges))
Code
par(mfrow = c(1, 2))

plot(bio1_neo, main = "Annual Mean Temperature", 
     col = rev(viridis::inferno(10)))

plot(bio12_neo, main = "Annual Precipitation", 
     col = rev(viridis::inferno(10)))

Much better!

Obtain the values from bio1, bio12, SR and MRD for each cell or pixel using the coordinates.

Code
# Get environmental data using coordinates from our maps
f_ras_bios <- extract(x = c(bio1_neo, bio12_neo), # environmental variables
                      y = data_sr_mrd[, c("x", "y")]) %>% # coordinates
  rename(MAT = bio1, MAP = bio12)

# Combine the information
fdata <- bind_cols(data_sr_mrd, f_ras_bios)

head(fdata)
          x        y SR MRD ID  MAT MAP
1 -108.5817 28.51778  1  17  1 15.4 720
2 -109.5817 27.51778  2  15  2 24.1 532
3 -108.5817 27.51778  2  16  3 19.6 799
4 -107.5817 27.51778  2  17  4 12.2 745
5 -108.5817 26.51778  2  15  5 24.4 624
6 -107.5817 26.51778  2  17  6 19.8 925

Now make a simple correlation between the Furnariides richness and bio1 and bio12.

Species richness correlated with Temperature

Code
cor.test(fdata$SR, fdata$MAT)

    Pearson's product-moment correlation

data:  fdata$SR and fdata$MAT
t = 23.101, df = 1644, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.4576616 0.5306551
sample estimates:
      cor 
0.4950313 

Species richness correlated with Precipitation

Code
cor.test(fdata$SR, fdata$MAP)

    Pearson's product-moment correlation

data:  fdata$SR and fdata$MAP
t = 35.034, df = 1644, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6252291 0.6806113
sample estimates:
      cor 
0.6537949 

And also the linear model…

Code
lmbio1 <- lm(SR ~ MAT, data = fdata)

summary(lmbio1)

Call:
lm(formula = SR ~ MAT, data = fdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-64.532 -20.247  -4.611  17.341 136.721 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  -2.9588     2.4722  -1.197    0.232    
MAT           2.5696     0.1112  23.101   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 28.89 on 1644 degrees of freedom
Multiple R-squared:  0.2451,    Adjusted R-squared:  0.2446 
F-statistic: 533.6 on 1 and 1644 DF,  p-value: < 2.2e-16
Code
lmbio12 <- lm(SR ~ MAP, data = fdata)

summary(lmbio12)

Call:
lm(formula = SR ~ MAP, data = fdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-125.633  -13.056   -3.016   12.382  115.190 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.429e+01  1.236e+00   11.57   <2e-16 ***
MAP         2.471e-02  7.053e-04   35.03   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 25.16 on 1644 degrees of freedom
Multiple R-squared:  0.4274,    Adjusted R-squared:  0.4271 
F-statistic:  1227 on 1 and 1644 DF,  p-value: < 2.2e-16

Which environmental variable is more related with Furnariides richness?

Answer: Both MAT and MAP are positively associated with the number of species. However, MAP shows a more steeper association than MAT. Also, MAP (R^2 = 0.43) explain more variance than MAT (R^2 = 0.24).

Please explain the relationship from an ecological perspective

Answer: The positive association between MAP and MAT with the number of species suggests that wetter and hotter areas contain more species. This might suggest that these areas present high local heterogeneity, allowing more species to coexist or to accumulate over time.

Code
fdata %>% 
  ggplot(aes(x = MAT, y = SR)) + 
  geom_point(color = "darkgray") + 
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'

Code
fdata %>% 
  ggplot(aes(x = MAP, y = SR)) + 
  geom_point(color = "darkgray") + 
  geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'

3.5 Considering spatial autocorrelation

This paragraph was extracted entirely from (F. Dormann et al., 2007): The analysis of spatial data is complicated by a phenomenon known as spatial autocorrelation. Spatial autocorrelation (SAC) occurs when the values of variables sampled at nearby locations are not independent from each other (Tobler, 1970). The causes of spatial autocorrelation are manifold, but three factors are particularly common: 1) biological processes such as speciation, extinction, dispersal or species interactions are distance‐related; 2) non‐linear relationships between environment and species are modelled erroneously as linear; 3) the statistical model fails to account for an important environmental determinant that in itself is spatially structured and thus causes spatial structuring in the response (Besag, 1974). Since they also lead to autocorrelated residuals, these are equally problematic. A fourth source of spatial autocorrelation relates to spatial resolution, because coarser grains lead to a spatial smoothing of data. In all of these cases, SAC may confound the analysis of species distribution data.

We know that a correlation is not a causation, so, to explore the relationship we need to build a model or fit a model. To explore this relationships we will first explore a simple Ordinary Least Square regression or OLS.

Code
fols <- lm(SR ~ MAT + MAP, data = fdata)

summary(fols)

Call:
lm(formula = SR ~ MAT + MAP, data = fdata)

Residuals:
     Min       1Q   Median       3Q      Max 
-106.713  -13.998   -2.648   11.496  123.965 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.3665430  2.1168885   0.173    0.863    
MAT         0.9307702  0.1159763   8.026 1.91e-15 ***
MAP         0.0208258  0.0008444  24.664  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24.69 on 1643 degrees of freedom
Multiple R-squared:  0.449, Adjusted R-squared:  0.4484 
F-statistic: 669.6 on 2 and 1643 DF,  p-value: < 2.2e-16

Let’s complicate our model a little bit… Now let’s include the MRD values as covariable (Aka predictor).

Code
fols2 <- lm(SR ~ MAT + MAP + MRD, data = fdata) 

summary(fols2)

Call:
lm(formula = SR ~ MAT + MAP + MRD, data = fdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-74.091 -10.068  -1.527  10.212 125.040 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 97.6603034  7.3781786  13.236  < 2e-16 ***
MAT          0.7196253  0.1109731   6.485 1.17e-10 ***
MAP          0.0133778  0.0009673  13.830  < 2e-16 ***
MRD         -5.4480900  0.3975883 -13.703  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.39 on 1642 degrees of freedom
Multiple R-squared:  0.5056,    Adjusted R-squared:  0.5047 
F-statistic: 559.7 on 3 and 1642 DF,  p-value: < 2.2e-16

What is telling us these two OLS models?

Answer: The first model, which includes MAT and MAP as predictors, suggests that species richness will increase as a function of increasing precipitation and temperature. This model explains 45% of the variation in species richness. It is important to highlight that despite the slopes being different from zero, the intercept does not, so this output should be considered exploratory.

The second model includes MRD as an additional covariable. By including MRD, the variation explained went up to 50%. So, including an evolutionary-informed covariable allowed us to explain better the variation in the number of species that co-occur in the Neotropics.

Now, explore the spatial autocorrelation of the Furnariides richness gradient. Spatial autocorrelation (it can also be temporal or phylogenetic) is a measure of similarity (correlation) between nearby observations. In other words, the spatial autocorrelation describe the degree two which observations (values) at spatial locations (whether they are points, areas, or raster cells), are similar to each other.

Code
autocor_SR <- ncf::correlog(fdata$x, # longitude
                            fdata$y, # latitude
                            z = fdata$SR, # species richness
                            na.rm = TRUE, 
                            increment = 1, 
                            resamp = 1)

Let’s use a correlogram to explore the spatial autocorrelation. Remember, spatial autocorrelation (it can also be temporal or phylogenetic) is a measure of similarity (correlation) between nearby observations. Thus, high values means high spatial autocorrelation.

Code
plot(autocor_SR$correlation[1:50], 
     type = "b", pch = 1, cex = 1.2, lwd = 1.5, ylim = c(-1, 1), 
     xlab = "Distance class", 
     ylab = "Moran's I", 
     cex.lab = 1.2, 
     cex.axis = 1.2)

abline(h = 0)

Is there a spatial autocorrelation in the data? Please explain your answer

Answer: Yes, we can see that close pixels are more similar in terms of the number of species than pixels that are farther apart. Notably, around the distance class 20, the positive spatial autocorrelation disappears.

What about the residuals? Let’s explore the spatial autocorrelation in the residuals.

Code
coords <- fdata[1:2] 

coords <- as.matrix(coords) 

head(coords)
             x        y
[1,] -108.5817 28.51778
[2,] -109.5817 27.51778
[3,] -108.5817 27.51778
[4,] -107.5817 27.51778
[5,] -108.5817 26.51778
[6,] -107.5817 26.51778

Build a neighborhood contiguity by distance. The distance used in this example is 1.5 degrees but you can try with a large distance if you wish to explore more models.

Code
# neighborhood contiguity by distance
nb1.5 <- spdep::dnearneigh(coords, 0, 1.5) 

Using the neighborhood contiguity build spatial weights for neighbor lists.

Code
nb1.5.w <- spdep::nb2listw(neighbours = nb1.5, 
                           glist = NULL, 
                           style = "W", 
                           zero.policy = TRUE)

Extract the residuals from the OLS model

Code
residuals_ols <- residuals(fols2) 

plot(residuals_ols)

Calculate a univariate spatial correlogram.

Code
autocor_ols_res <- ncf::correlog(x = fdata$x, 
                            y = fdata$y, 
                            z = residuals(fols), 
                            increment = 1, 
                            resamp = 1)

Plot the autocorrelagram for the residuals

Code
plot(autocor_ols_res$correlation[1:50], 
     type = "b", pch = 1, cex = 1.2, lwd = 1.5, ylim = c(-0.5, 1), 
     xlab = "distance", 
     ylab = "Moran's I", 
     cex.lab = 1.5, 
     cex.axis = 1.2) 

abline(h = 0)

title(main = "OLS residuals", cex = 1.5)

How many distance classes are necessary to eliminate the spatial autocorrelation in the residuals

Answer: As in the case of species richness, the residuals also present high spatial autocorrelation at short distances, confirming that additional factors or processes are influencing the patterns we observe in nature. In other words, MAP, MAT, and MRD are insufficient to explain the variation in species richness.

Ohhh, seems that the residuals have a strong spatial autocorrelation, that is a problem because if we found autocorrelation in the residuals much of the explanation that we obtain can be biased. See explanation above.

Let’s inspect the two autocorrelograms.

Code
par(mfrow = c(2, 1))

# Plot SR
plot(autocor_SR$correlation[1:50], 
     type = "b", pch = 1, cex = 1.2, 
     lwd = 1.5, ylim = c(-1, 1), 
     xlab = "Distance class", 
     ylab = "Moran's I", 
     cex.lab = 1.2, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS model", cex = 1.5)

# Plot residuals SR
plot(autocor_ols_res$correlation[1:50], 
     type = "b", pch = 1, cex = 1.2, 
     lwd = 1.5, ylim = c(-0.5, 1), 
     xlab = "Distance class", 
     ylab = "Moran's I", 
     cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS residuals", cex = 1.5)

Hmmm, seems that there is a strong spatial autocorrelation, thus any conclusion using the OLS model can be biased.

How do you feel about that?

Answer: Although simple linear models (OLS) can help us detect patterns to make inferences, we need to go at least one step ahead and explore the spatial structure of the data to better understand the factors and processes that generate the observed data.

To try to solve this important issue, we will use spatial simultaneous autoregressive error model estimation (Aka SAR model), this kind of models account for spatial autocorrelation by adding an extra term (autoregressive) in the form of a spatial-weight matrix that specifies the neighborhood of each cell or pixel and the relative weight of each neighbor.

Let’s fit the SAR model.

Code
sar_nb1.5.w <- spatialreg::errorsarlm(fols2, 
                                      listw = nb1.5.w, 
                                      data = fdata,
                                      quiet = FALSE, 
                                      zero.policy = TRUE, 
                                      na.action = na.exclude)

Spatial autoregressive error model

Jacobian calculated using neighbourhood matrix eigenvalues
Computing eigenvalues ...

lambda: -0.236068  function: -7802.996  Jacobian: -6.146368  SSE: 1254089 
lambda: 0.236068  function: -7213.873  Jacobian: -6.909469  SSE: 612417.1 
lambda: 0.527864  function: -6783.873  Jacobian: -39.39779  SSE: 349135.3 
lambda: 0.7082039  function: -6479.495  Jacobian: -80.34491  SSE: 229491.4 
lambda: 0.8196601  function: -6267.65  Jacobian: -120.206  SSE: 169021.6 
lambda: 0.8885438  function: -6137.484  Jacobian: -155.1728  SSE: 138293.6 
lambda: 0.9311163  function: -6072.505  Jacobian: -184.1433  SSE: 123374.4 
lambda: 0.9574275  function: -6046.676  Jacobian: -207.3018  SSE: 116245.1 
lambda: 0.9736888  function: -6039.709  Jacobian: -225.3863  SSE: 112760.1 
lambda: 0.9820408  function: -6040.039  Jacobian: -236.7272  SSE: 111261.5 
lambda: 0.9768261  function: -6039.456  Jacobian: -229.4254  SSE: 112173.5 
lambda: 0.977008  function: -6039.455  Jacobian: -229.667  SSE: 112140.3 
lambda: 0.9771292  function: -6039.454  Jacobian: -229.8284  SSE: 112118.3 
lambda: 0.9771207  function: -6039.454  Jacobian: -229.8171  SSE: 112119.9 
lambda: 0.9771206  function: -6039.454  Jacobian: -229.8168  SSE: 112119.9 
lambda: 0.9771206  function: -6039.454  Jacobian: -229.8169  SSE: 112119.9 
lambda: 0.9771205  function: -6039.454  Jacobian: -229.8168  SSE: 112119.9 
lambda: 0.9771206  function: -6039.454  Jacobian: -229.8168  SSE: 112119.9 
Code
# this will take a while, ~10 seconds in Jesús's computer
Code
summary(sar_nb1.5.w)

Call:spatialreg::errorsarlm(formula = fols2, data = fdata, listw = nb1.5.w, 
    na.action = na.exclude, quiet = FALSE, zero.policy = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-45.6541  -2.6715  -0.4562   2.1783  52.8991 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error z value  Pr(>|z|)
(Intercept) 58.29989009 10.26207282  5.6811 1.338e-08
MAT         -0.32648785  0.09576075 -3.4094  0.000651
MAP          0.00186297  0.00061516  3.0284  0.002458
MRD         -0.76025999  0.27606052 -2.7540  0.005888

Lambda: 0.97712, LR test value: 2966.3, p-value: < 2.22e-16
Asymptotic standard error: 0.0044266
    z-value: 220.74, p-value: < 2.22e-16
Wald statistic: 48724, p-value: < 2.22e-16

Log likelihood: -6039.454 for error model
ML residual variance (sigma squared): 68.117, (sigma: 8.2533)
Number of observations: 1646 
Number of parameters estimated: 6 
AIC: 12091, (AIC for lm: 15055)

Extract the residuals from SAR model

Code
residuals_sar_nb1.5.w <- residuals(sar_nb1.5.w) 

Now estimate the spatial autocorrelation of the SAR model.

Code
autocor_sar_nb1.5.w <- ncf::correlog(x = fdata$x, 
                                     y = fdata$y, 
                                     z = residuals(sar_nb1.5.w), 
                                     na.rm = TRUE, 
                                     increment = 1, 
                                     resamp = 1)

Plot the autocorrelogram under the SAR model.

Code
plot(autocor_sar_nb1.5.w$correlation[1:50], 
     type = "b", pch = 4, cex = 1.2, 
     lwd = 1.5, ylim = c(-0.5, 1), 
     xlab = "distance", 
     ylab = "Moran's I", 
     cex.lab = 1.5, 
     cex.axis = 1.2)

abline(h = 0) 

title(main = "SARerr residuals", cex = 1.5)

Ohhh, where is the autocorrelation in the residuals? Now compare the two autocorrelograms.

Code
par(mfrow = c(2, 1))

plot(autocor_ols_res$correlation[1:50], 
     type = "b", pch = 1, cex = 1.2, 
     lwd = 1.5, ylim = c(-0.5, 1), 
     xlab = "distance", 
     ylab = "Moran's I", cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "OLS residuals", cex = 1.5)

plot(autocor_sar_nb1.5.w$correlation[1:50], 
     type = "b", pch = 4, cex = 1.2, 
     lwd = 1.5, ylim = c(-0.5, 1), 
     xlab = "distance", 
     ylab = "Moran's I", 
     cex.lab = 1.5, 
     cex.axis = 1.2)
abline(h = 0)
title(main = "SARerr residuals", cex = 1.5)

Ok, now we know that the SAR model can solve the problem in the spatial autocorrelation in the residuals, let’s try to make some inferences.

Code
summary(sar_nb1.5.w)

Call:spatialreg::errorsarlm(formula = fols2, data = fdata, listw = nb1.5.w, 
    na.action = na.exclude, quiet = FALSE, zero.policy = TRUE)

Residuals:
     Min       1Q   Median       3Q      Max 
-45.6541  -2.6715  -0.4562   2.1783  52.8991 

Type: error 
Coefficients: (asymptotic standard errors) 
               Estimate  Std. Error z value  Pr(>|z|)
(Intercept) 58.29989009 10.26207282  5.6811 1.338e-08
MAT         -0.32648785  0.09576075 -3.4094  0.000651
MAP          0.00186297  0.00061516  3.0284  0.002458
MRD         -0.76025999  0.27606052 -2.7540  0.005888

Lambda: 0.97712, LR test value: 2966.3, p-value: < 2.22e-16
Asymptotic standard error: 0.0044266
    z-value: 220.74, p-value: < 2.22e-16
Wald statistic: 48724, p-value: < 2.22e-16

Log likelihood: -6039.454 for error model
ML residual variance (sigma squared): 68.117, (sigma: 8.2533)
Number of observations: 1646 
Number of parameters estimated: 6 
AIC: 12091, (AIC for lm: 15055)
Code
summary(fols2)

Call:
lm(formula = SR ~ MAT + MAP + MRD, data = fdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-74.091 -10.068  -1.527  10.212 125.040 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 97.6603034  7.3781786  13.236  < 2e-16 ***
MAT          0.7196253  0.1109731   6.485 1.17e-10 ***
MAP          0.0133778  0.0009673  13.830  < 2e-16 ***
MRD         -5.4480900  0.3975883 -13.703  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.39 on 1642 degrees of freedom
Multiple R-squared:  0.5056,    Adjusted R-squared:  0.5047 
F-statistic: 559.7 on 3 and 1642 DF,  p-value: < 2.2e-16

By looking to the summary of the SAR and OLS models, explain the differences in the coefficients between both models.

Answer: The first difference is that the coefficients (intercepts and slopes) changed in strength and direction. More specifically, the covariable MAT that was positive now for the OLS model, under the SAR model is negative. The effect of MAP and MRD also decreased.

Let’s compare the slopes for both models in a similar way we did when we compared OLS versus PGLS. Let’s start with mean annual temperature.

Code
fdata %>% 
  ggplot(aes(x = MAT, y = SR)) + 
  geom_point(alpha = 0.5, color = "darkgray") + 
  labs(x = "Mean annual temperature", 
       y = "Species richness") + 
  geom_smooth(method = "lm", # coefficient OLS
              se = FALSE, 
              color = "blue", 
              linewidth = 2) + 
  geom_abline(intercept = coef(sar_nb1.5.w)[2], # coefficients SAR
              slope = coef(sar_nb1.5.w)[3], 
              color = "red", 
              linewidth = 2)
`geom_smooth()` using formula = 'y ~ x'

What about precipitation.

Code
fdata %>% 
  ggplot(aes(x = MAP, y = SR)) + 
  geom_point(alpha = 0.5, color = "darkgray") + 
  labs(x = "Annual precipitation", 
       y = "Species richness") + 
  geom_smooth(method = "lm", # coefficient OLS
              se = FALSE, 
              color = "blue", 
              linewidth = 2) + 
  geom_abline(intercept = coef(sar_nb1.5.w)[2], # coefficients SAR
              slope = coef(sar_nb1.5.w)[4], 
              color = "red", 
              linewidth = 2)
`geom_smooth()` using formula = 'y ~ x'

And finally mean root distance

Code
fdata %>% 
  ggplot(aes(x = MRD, y = SR)) + 
  geom_point(alpha = 0.5, color = "darkgray") + 
  labs(x = "Mean root distance", 
       y = "Species richness") + 
  geom_smooth(method = "lm", # coefficient OLS
              se = FALSE, 
              color = "blue", 
              linewidth = 2) + 
  geom_abline(intercept = coef(sar_nb1.5.w)[2], # coefficients SAR
              slope = coef(sar_nb1.5.w)[5], 
              color = "red", 
              linewidth = 2)
`geom_smooth()` using formula = 'y ~ x'

By inspecting the figures, we can see that sometimes, our conclusions about the factors influencing the patterns of biodiversity can change depending on how the data is analyzed.

The last exercise of this tutorial is to compare the prediction of both models. To calculate a R2 to the SAR model, we will use the function SARr2() from Jesús’s GitHub.

Code
source("https://raw.githubusercontent.com/jesusNPL/BetaDivNA/master/SARr2.R")

Calculate SAR R2

Code
SARr2(Lfull = sar_nb1.5.w$LL[[1]], # log likelihood of the model that includes the spatial weights
      Lnull = sar_nb1.5.w$logLik_lm.model[[1]], # log likelihood null model
      N = nrow(fdata) # number of observation
      ) 
[1] 0.8350546

Comparing the two models (OLS and SAR), please answer the following questions:

1. Which model have the best explanation?

Answer: The SAR model that includes the spatial structure of the data explain 84% of the variation in species richness. In contrast the OLS model only 50%.

2. What can we conclude from these results?

Answer: Although the covariables used in both models (SAR and OLS) help us to explain the variation in species richness, we can confirm that additional factors masked in the spatial structure of the data explain a high amount of variation.

3. Do the slopes (betas) change from the OLS to the SAR? What about the R2?

Answer: Yes, the coefficients and the R^2 changed drastically by taking into account the spatial structure of the data.

4. How do you feel about that?

Answer: Although simple models are useful for data exploration, if our ultimate goal is to make inferences about the processes that generate the data, then models that account for the structure of the data are necessary.

The end! for now…

References

Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society: Series B (Methodological), 36(2), 192–225. https://doi.org/10.1111/j.2517-6161.1974.tb00999.x
Brown, J. H. (2014). Why are there so many species in the tropics? Journal of Biogeography, 41(1), 8–22. https://doi.org/10.1111/jbi.12228
Currie, D. J., Mittelbach, G. G., Cornell, H. V., Field, R., Guegan, J.-F., Hawkins, B. A., Kaufman, D. M., Kerr, J. T., Oberdorff, T., O’Brien, E., & Turner, J. R. G. (2004). Predictions and tests of climate-based hypotheses of broad-scale variation in taxonomic richness. Ecology Letters, 7(12), 1121–1134. https://doi.org/10.1111/j.1461-0248.2004.00671.x
F. Dormann, C., M. McPherson, J., B. Araújo, M., Bivand, R., Bolliger, J., Carl, G., G. Davies, R., Hirzel, A., Jetz, W., Daniel Kissling, W., Kühn, I., Ohlemüller, R., R. Peres-Neto, P., Reineking, B., Schröder, B., M. Schurr, F., & Wilson, R. (2007). Methods to account for spatial autocorrelation in the analysis of species distributional data: A review. Ecography, 30(5), 609–628. https://doi.org/10.1111/j.2007.0906-7590.05171.x
Fine, P. V. A. (2015). Ecological and evolutionary drivers of geographic variation in species diversity. Annual Review of Ecology, Evolution, and Systematics, 46(1), 369–392. https://doi.org/10.1146/annurev-ecolsys-112414-054102
Hawkins, B. A. (2001). Ecology’s oldest pattern? Trends in Ecology & Evolution, 16(8), 470. https://doi.org/10.1016/S0169-5347(01)02197-8
Hawkins, B. A., & Felizola Diniz-Filho, J. A. (2004). “Latitude” and geographic patterns in species richness. Ecography, 27(2), 268–272. https://doi.org/10.1111/j.0906-7590.2004.03883.x
Hawkins, B. A., Field, R., Cornell, H. V., Currie, D. J., Guégan, J.-F., Kaufman, D. M., Kerr, J. T., Mittelbach, G. G., Oberdorff, T., O’Brien, E. M., Porter, E. E., & Turner, J. R. G. (2003). ENERGY, WATER, AND BROAD-SCALE GEOGRAPHIC PATTERNS OF SPECIES RICHNESS. Ecology, 84(12), 3105–3117. https://doi.org/10.1890/03-8006
Hawkins, B. A., McCain, C. M., Davies, T. J., Buckley, L. B., Anacker, B. L., Cornell, H. V., Damschen, E. I., Grytnes, J.-A., Harrison, S., Holt, R. D., Kraft, N. J. B., & Stephens, P. R. (2012). Different evolutionary histories underlie congruent species richness gradients of birds and mammals: Bird and mammal richness gradients. Journal of Biogeography, 39(5), 825–841. https://doi.org/10.1111/j.1365-2699.2011.02655.x
Kerr, J. T., & Currie, D. J. (1999). The relative importance of evolutionary and environmental controls on broad-scale patterns of species richness in north america. Écoscience, 6(3), 329–337. https://doi.org/10.1080/11956860.1999.11682546
O’Brien, E. M. (2006). Biological relativity to water?energy dynamics. Journal of Biogeography, 33(11), 1868–1888. https://doi.org/10.1111/j.1365-2699.2006.01534.x
Pianka, E. R. (1966). Latitudinal gradients in species diversity: A review of concepts. The American Naturalist, 100(910), 33–46. https://doi.org/10.1086/282398
Pinto-Ledezma, J. N., Simon, L. M., Diniz-Filho, J. A. F., & Villalobos, F. (2017). The geographical diversification of furnariides: The role of forest versus open habitats in driving species richness gradients. Journal of Biogeography, 44(8), 1683–1693. https://doi.org/10.1111/jbi.12939
Ricklefs, R. E. (2004). A comprehensive framework for global patterns in biodiversity. Ecology Letters, 7(1), 1–15. https://doi.org/10.1046/j.1461-0248.2003.00554.x
Rohde, K. (1992). Latitudinal gradients in species diversity: The search for the primary cause. Oikos, 65(3), 514. https://doi.org/10.2307/3545569
Simpson, G. G. (1964). Species density of north american recent mammals. Systematic Biology, 13(1), 57–73. https://doi.org/10.2307/sysbio/13.1-4.57
Tobler, W. R. (1970). A computer movie simulating urban growth in the detroit region. Economic Geography, 46, 234. https://doi.org/10.2307/143141